27 research outputs found

    Parametric inference in the large data limit using maximally informative models

    Motivated by data-rich experiments in transcriptional regulation and sensory neuroscience, we consider the following general problem in statistical inference. When exposed to a high-dimensional signal S, a system of interest computes a representation R of that signal which is then observed through a noisy measurement M. From a large number of signals and measurements, we wish to infer the "filter" that maps S to R. However, the standard method for solving such problems, likelihood-based inference, requires perfect a priori knowledge of the "noise function" mapping R to M. In practice such noise functions are usually known only approximately, if at all, and using an incorrect noise function will typically bias the inferred filter. Here we show that, in the large data limit, this need for a pre-characterized noise function can be circumvented by searching for filters that instead maximize the mutual information I[M;R] between observed measurements and predicted representations. Moreover, if the correct filter lies within the space of filters being explored, maximizing mutual information becomes equivalent to simultaneously maximizing every dependence measure that satisfies the Data Processing Inequality. It is important to note that maximizing mutual information will typically leave a small number of directions in parameter space unconstrained. We term these directions "diffeomorphic modes" and present an equation that allows these modes to be derived systematically. The presence of diffeomorphic modes reflects a fundamental and nontrivial substructure within parameter space, one that is obscured by standard likelihood-based inference.
    Comment: To appear in Neural Computation
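For orientation, the two quantities this abstract relies on can be written out as follows. This is a sketch in assumed notation, not the paper's own derivation: \theta denotes the true filter, \theta' any candidate filter, and R' = \theta'(S) its predicted representation. It further assumes that the filter is deterministic and that M depends on S only through R.

```latex
% Mutual information between measurements M and representations R.
\begin{align}
  I[M;R] &= \int dM \, dR \; p(M,R) \, \log \frac{p(M,R)}{p(M)\, p(R)} .
\end{align}
% Assumptions (not from the source): R = \theta(S) is deterministic and
% M depends on S only through R, so M -> R -> S -> R' is a Markov chain.
% The Data Processing Inequality then bounds every candidate filter:
\begin{align}
  I[M;R'] \;\le\; I[M;S] \;=\; I[M;R] .
\end{align}
```

Under these assumptions no candidate filter can yield more information about M than the true one, which is why maximizing I[M;R] over filters is a sensible objective when the noise function is unknown.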

    Equitability, mutual information, and the maximal information coefficient

    Reshef et al. recently proposed a new statistical measure, the "maximal information coefficient" (MIC), for quantifying arbitrary dependencies between pairs of stochastic quantities. MIC is based on mutual information, a fundamental quantity in information theory that is widely understood to serve this need. MIC, however, is not an estimate of mutual information. Indeed, it was claimed that MIC possesses a desirable mathematical property called "equitability" that mutual information lacks. This was not proven; instead it was argued solely through the analysis of simulated data. Here we show that this claim is, in fact, incorrect. First, we offer a mathematical proof that no (non-trivial) dependence measure satisfies the definition of equitability proposed by Reshef et al. We then propose a self-consistent and more general definition of equitability that follows naturally from the Data Processing Inequality. Mutual information satisfies this new definition of equitability while MIC does not. Finally, we show that the simulation evidence offered by Reshef et al. was artifactual. We conclude that estimating mutual information is not only practical for many real-world applications, but also provides a natural solution to the problem of quantifying associations in large data sets.

    Kerfuffle: a web tool for multi-species gene colocalization analysis

    The evolutionary pressures that underlie the large-scale functional organization of the genome are not well understood in eukaryotes. Recent evidence suggests that functionally similar genes may colocalize (cluster) in the eukaryotic genome, suggesting a role for chromatin-level gene regulation in shaping the physical distribution of coordinated genes. However, few of the bioinformatic tools currently available allow for a systematic study of gene colocalization across several evolutionarily distant species. Kerfuffle is a web tool designed to help discover, visualize, and quantify the physical organization of genomes by identifying significant gene colocalization and conservation across the assembled genomes of available species (currently up to 47, from humans to worms). Kerfuffle only requires the user to specify a list of human genes and the names of other species of interest. Without further input from the user, the software queries the e!Ensembl BioMart server to obtain positional information and discovers homology relations in all genes and species specified. Using this information, Kerfuffle performs a multi-species clustering analysis, presents downloadable lists of clustered genes, performs Monte Carlo statistical significance calculations, estimates how conserved gene clusters are across species, plots histograms and interactive graphs, allows users to save their queries, and generates a downloadable visualization of the clusters using the Circos software. These analyses may be used to further explore the functional roles of gene clusters by interrogating the enriched molecular pathways associated with each cluster.
    Comment: BMC Bioinformatics, in press

    Estimating mutual information and multi-information in large networks

    We address the practical problems of estimating the information relations that characterize large networks. Building on methods developed for analysis of the neural code, we show that reliable estimates of mutual information can be obtained with manageable computational effort. The same methods allow estimation of higher order, multi-information terms. These ideas are illustrated by analyses of gene expression, financial markets, and consumer preferences. In each case, information theoretic measures correlate with independent, intuitive measures of the underlying structures in the system.
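To make the idea of estimating mutual information from paired samples concrete, here is a minimal sketch of a naive histogram ("plug-in") estimator. It is not the method of the paper above, which builds on techniques from neural-code analysis; the function name `plugin_mutual_information` and the default of 8 bins are illustrative assumptions. Plug-in estimates are biased upward for small samples, which is one reason more careful methods are needed in practice.

```python
import numpy as np


def plugin_mutual_information(x, y, bins=8):
    """Naive plug-in estimate of I[X;Y] in bits from paired samples.

    Hypothetical helper for illustration only: it bins the data on an
    equal-width grid and evaluates the mutual information of the
    resulting empirical joint distribution.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()              # empirical joint distribution
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of X, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, bins)
    nonzero = p_xy > 0                      # skip empty cells (0 * log 0 = 0)
    ratio = p_xy[nonzero] / (p_x @ p_y)[nonzero]
    return float(np.sum(p_xy[nonzero] * np.log2(ratio)))
```

Called on two strongly dependent sample vectors, the function should return a clearly larger value than on two independently drawn vectors of the same length.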

    Cell non-autonomous interactions during non-immune stromal progression in the breast tumor microenvironment

    Summary: The breast tumor microenvironment of primary and metastatic sites is a complex milieu of differing cell populations, consisting of tumor cells and the surrounding stroma. Despite recent progress in delineating the immune component of the stroma, the genomic expression landscape of the non-immune stroma (NIS) population and its role in mediating cancer progression and informing effective therapies are not well understood. Here we obtained 52 cell-sorted NIS and epithelial tissue samples across 37 patients from i) normal breast, ii) normal breast adjacent to primary tumor, iii) primary tumor, and iv) metastatic tumor sites. Deep RNA-seq revealed diverging gene expression profiles as the NIS evolves from normal to metastatic tumor tissue, with intra-patient normal-primary variation comparable to inter-patient variation. Significant expression changes between normal and adjacent normal tissue support the notion of a cancer field effect, one that extends to the NIS. Most differentially expressed protein-coding genes and lncRNAs were found to be associated with pattern formation, embryogenesis, and the epithelial-mesenchymal transition. We validated the protein expression changes of a novel candidate gene, C2orf88, by immunohistochemistry staining of representative tissues. Significant mutual information between epithelial ligand and NIS receptor gene expression, across primary and metastatic tissue, suggests a unidirectional model of molecular signaling between the two tissues. Furthermore, survival analyses of 827 luminal breast tumor samples demonstrated the predictive power of the NIS gene expression to inform clinical outcomes. Together, these results highlight the evolution of NIS gene expression in breast tumors and suggest novel therapeutic strategies targeting the microenvironment.

    Absence of central tolerance in Aire-deficient mice synergizes with immune-checkpoint inhibition to enhance antitumor responses.

    Endogenous anti-tumor responses are limited in part by the absence of tumor-reactive T cells, an inevitable consequence of thymic central tolerance mechanisms ensuring prevention of autoimmunity. Here we show that tumor rejection induced by immune checkpoint blockade is significantly enhanced in Aire-deficient mice, the epitome of central tolerance breakdown. The observed synergy in tumor rejection extended to different tumor models and was accompanied by increased numbers of activated T cells expressing high levels of Gzma, Gzmb, Perforin, and Cxcr3, and by increased intratumoural levels of Cxcl9 and Cxcl10 compared to wild-type mice. Consistent with Aire's central role in T cell repertoire selection, single cell TCR sequencing unveiled expansion of several clones with high tumor reactivity. The data suggest that breakdown in central tolerance synergizes with immune checkpoint blockade in enhancing anti-tumor immunity and may serve as a model to unmask novel anti-tumor therapies including anti-tumor TCRs, normally purged during central tolerance.

    A framework for highly multiplexed dextramer mapping and prediction of T cell receptor sequences to antigen specificity.

    T cell receptor (TCR) antigen-specific recognition is essential for the adaptive immune system. However, building a TCR-antigen interaction map has been challenging due to the staggering diversity of TCRs and antigens. Accordingly, highly multiplexed dextramer-TCR binding assays have been recently developed, but the utility of the ensuing large datasets is limited by the lack of robust computational methods for normalization and interpretation. Here, we present a computational framework comprising a novel method, ICON (Integrative COntext-specific Normalization), for identifying reliable TCR-pMHC (peptide-major histocompatibility complex) interactions and a neural network-based classifier, TCRAI, that outperforms other state-of-the-art methods for TCR-antigen specificity prediction. We further demonstrate that by combining ICON and TCRAI, we are able to discover novel subgroups of TCRs that bind to a given pMHC via different mechanisms. Our framework facilitates the identification and understanding of TCR-antigen-specific interactions for basic immunological research and clinical immune monitoring.

    Fine-scale detection of population-specific linkage disequilibrium using haplotype entropy in the human genome

    Background: The creation of a coherent genomic map of recent selection is one of the greatest challenges towards a better understanding of human evolution and the identification of functional genetic variants. Several methods have been proposed to detect linkage disequilibrium (LD), which is indicative of natural selection, from genome-wide profiles of common genetic variations but are designed for large regions.
    Results: To find population-specific LD within small regions, we have devised an entropy-based method that utilizes differences in haplotype frequency between populations. The method has the advantages of incorporating multilocus association, conciliation with low allele frequencies, and independence from allele polarity, which are ideal for short haplotype analysis. The comparison of HapMap SNPs data from African and Caucasian populations with a median resolution size of ~23 kb gave us novel candidates as well as known selection targets. Enrichment analysis for the yielded genes showed associations with diverse diseases such as cardiovascular, immunological, neurological, and skeletal and muscular diseases. A possible scenario for a selective force is discussed. In addition, we have developed a web interface (ENIGMA, available at http://gibk21.bse.kyutech.ac.jp/ENIGMA/index.html), which allows researchers to query their regions of interest for population-specific LD.
    Conclusion: The haplotype entropy method is powerful for detecting population-specific LD embedded in short regions and should contribute to further studies aiming to decipher the evolutionary histories of modern humans.
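As a rough illustration of what an entropy computed from haplotype frequencies looks like, the sketch below evaluates the Shannon entropy of the haplotypes observed in one window for two populations. It is an assumption-laden toy, not the statistic defined in the paper: the function name `haplotype_entropy`, the string encoding of haplotypes, and the example data are all hypothetical.

```python
from collections import Counter
from math import log2


def haplotype_entropy(haplotypes):
    """Shannon entropy (bits) of haplotype frequencies in one window.

    Hypothetical helper: `haplotypes` is a list of strings, one per
    sampled chromosome, each spelling the alleles at the window's SNPs.
    """
    counts = Counter(haplotypes)
    total = len(haplotypes)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Toy data (invented): population A carries a single haplotype in this
# window, population B carries four haplotypes at equal frequency.
population_a = ["ACGT"] * 8
population_b = ["ACGT", "ACGA", "TCGT", "TCGA"] * 2

entropy_gap = haplotype_entropy(population_b) - haplotype_entropy(population_a)
```

A window where one population shows much lower haplotype entropy than the other is the kind of signal a population-specific LD scan would flag for closer inspection.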